{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2.2 多元变量"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在考虑有 $K$ 个状态的问题。\n",
    "\n",
    "我们用一个 $K$ 维的向量 $(x_1, \\dots, x_K)$ 来表示这些状态，第 $k$ 个状态用 $x_k = 1, x_j=0, \\forall j \\neq k$ 表示。\n",
    "\n",
    "例如 $\\mathbf x=(0,0,1,0,0,0)^\\text{T}$ 表示 $K = 6$ 的第 $3$ 个状态。\n",
    "\n",
    "这些向量都满足 $\\sum_{k=1}^K x_k = 1$。\n",
    "\n",
    "假设 $x_k=1$ 的概率为 $\\mu_k$，则 $x$ 的分布为：\n",
    "\n",
    "$$\n",
    "p(\\mathbf x~|~\\mathbf\\mu)=\\prod_{k=1}^K \\mu_k^{x_k}\n",
    "$$\n",
    "\n",
    "其中 $\\mathbf\\mu=(\\mu_1, \\dots, \\mu_K)^\\text{T}$，且满足 $\\mu_k \\geq 0, \\sum_j \\mu_k = 1$。\n",
    "\n",
    "这可以看成是伯努利分布的一般化表示。为了验证它是概率分布，我们有\n",
    "\n",
    "$$\n",
    "\\sum_\\mathbf{x} p(\\mathbf x~|~\\mathbf\\mu) = \\sum_{k=1}^K \\mu_k = 1\n",
    "$$\n",
    "\n",
    "均值为\n",
    "\n",
    "$$\n",
    "\\mathbb E[\\mathbf{x|\\mu}] = \\sum_\\mathbf{x} \\mathbf{x} p(\\mathbf x~|~\\mathbf\\mu) =(\\mu_1, \\dots, \\mu_K)^\\text{T} = \\mathbf\\mu\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 最大似然估计"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "考虑一组 i.i.d. 的观测数据 $\\mathbf D=(\\mathbf x_1, \\dots, \\mathbf x_N)$，其似然函数为\n",
    "\n",
    "$$\n",
    "p(\\mathcal D|\\mathbf\\mu) = \\prod_{n=1}^N \\prod_{k=1}^K \\mu_k^{x_{nk}} = \\prod_{k=1}^K \\mu_k^{\\left(\\sum_n x_{nk}\\right)} = \\prod_{k=1}^K \\mu_k^{m_k}\n",
    "$$\n",
    "\n",
    "其中 \n",
    "\n",
    "$$\n",
    "m_k =\\sum_n x_{nk} \n",
    "$$\n",
    "\n",
    "表示出现 $x_k=1$ 的次数，这些也是这个分布的充分统计量。\n",
    "\n",
    "在 $\\sum_{k=1}^K \\mu_k = 1$ 的约束条件下，为了最大化对数似然，我们考虑它的拉格朗日函数：\n",
    "\n",
    "$$\n",
    "\\sum_{k=1}^K m_k\\ln \\mu_k + \\lambda (\\sum_{k=1}^K \\mu_k - 1)\n",
    "$$\n",
    "\n",
    "对 $\\mu_k$ 的偏导设为 $0$，有：\n",
    "\n",
    "$$\n",
    "\\mu_k = - \\frac{m_k}{\\lambda}\n",
    "$$\n",
    "\n",
    "带入约束条件得到：\n",
    "\n",
    "$$\n",
    "\\mu_k^{\\text{ML}} = \\frac{m_k}{N}\n",
    "$$\n",
    "\n",
    "即 $x_k=1$ 在观测数据中所占的比例。\n",
    "\n",
    "如果只考虑 $m_1,\\dots,m_K$，那么我们可以定义多项分布（`multinomial distribution`）：\n",
    "\n",
    "$$\n",
    "\\text{Mult}(m_1, m_2,\\dots,m_k|\\mathbf\\mu, N) = \\begin{pmatrix} N \\\\ m_1m_2\\dots m_k\\end{pmatrix} \\prod_{k=1}^K \\mu_k^{m_k}\n",
    "$$\n",
    "\n",
    "其中：\n",
    "\n",
    "$$\n",
    "\\begin{pmatrix} N \\\\ m_1m_2\\dots m_k\\end{pmatrix} \\equiv \\frac{N!}{m_1!m_2!\\dotsm_K!}\n",
    "$$\n",
    "\n",
    "以及\n",
    "\n",
    "$$\n",
    "N = \\sum_{k=1}^K m_k\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.2.1 狄利克雷分布"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "与二元分布类似，我们要给多元分布引入一个先验分布。考虑多元分布的形式，为了满足共轭性，先验分布应该满足这样的形式：\n",
    "\n",
    "$$\n",
    "p(\\mathbf{\\mu|\\alpha}) \\propto \\prod_{k=1}^K \\mu_k^{\\alpha_k-1}\n",
    "$$\n",
    "\n",
    "其中 $0 \\leq \\mu_k \\leq 1, \\sum_k \\mu_k = 1$，$\\mathbf\\alpha=\\left(\\alpha_1,\\dots,\\alpha_K\\right)^{\\text T}$ 是先验分布的参数。\n",
    "\n",
    "事实上，$\\mu_k$ 的分布是一个 $K-1$ 的单纯形。\n",
    "\n",
    "归一化这个分布，我们可以得到\n",
    "\n",
    "$$\n",
    "\\text{Dir}(\\mathbf{\\mu|\\alpha}) = \\frac{\\Gamma(\\alpha_0)}{\\Gamma(\\alpha_1)\\dots\\Gamma(\\alpha_K)} \\prod_{k=1}^K\\mu_k^{\\alpha_k-1}\n",
    "$$\n",
    "\n",
    "其中 \n",
    "\n",
    "$$\n",
    "\\alpha_0 = \\sum_{k=1}^K \\alpha_k\n",
    "$$\n",
    "\n",
    "这个分布叫做狄利克雷分布（`Dirichlet distribution`）。\n",
    "\n",
    "使用狄利克雷分布作为先验，后验分布为：\n",
    "\n",
    "$$\n",
    "p(\\mathbf \\mu|\\mathcal D,\\mathbf \\alpha) \n",
    "\\propto p(\\mathcal D|\\mathbf \\alpha) p(\\mathbf{\\mu|\\alpha})\n",
    "\\propto \\prod_{k=1}^K \\mu_k^{\\alpha_k + m_k - 1}\n",
    "$$\n",
    "\n",
    "由共轭性，我们可以得到后验分布也是个狄利克雷分布：\n",
    "\n",
    "$$\n",
    "p(\\mathbf \\mu|\\mathcal D,\\mathbf \\alpha)\n",
    "= \\text{Dir}(\\mathbf{\\mu|\\alpha+m})\n",
    "= \\frac{\\Gamma(\\alpha_0+ N)}{\\Gamma(\\alpha_1+m_1)\\dots\\Gamma(\\alpha_K+m_K)} \\prod_{k=1}^K \\mu_k^{\\alpha_k+m_k-1}\n",
    "$$\n",
    "\n",
    "其中 $\\mathbf m=\\left(m_1,\\dots,m_K\\right)^{\\text T}$。\n",
    "\n",
    "与二项分布类似，我们也可以将 $a_k$ 看成 $x_k=1$ 的一个有效观测次数。\n",
    "\n",
    "二项分布可以看出是多项分布 $K = 2$ 的特殊情况。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
